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BAYESIAN DETECTION OF IMAGE BOUNDARIES 
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Duke University and North Carolina State University 

Detecting boundary of an image based on noisy observations is a 
fundamental problem of image processing and image segmentation. 

For a d-dimensional image (d = 2,3,...), the boundary can often 
be described by a closed smooth (d — l)-dimensional manifold. In 
this paper, we propose a nonparametric Bayesian approach based on 
priors indexed by the unit sphere in We derive optimal 

posterior contraction rates for Gaussian processes or finite random 
series priors using basis functions such as trigonometric polynomials 
for 2-dimensional images and spherical harmonics for 3-dimensional 
images. For 2-dimensional images, we show a rescaled squared ex¬ 
ponential Gaussian process on achieves four goals of guaranteed 
geometric restriction, (nearly) minimax optimal rate adapting to the 
smoothness level, convenience for joint inference and computational 
efficiency. We conduct an extensive study of its reproducing kernel 
Hilbert space, which may be of interest by its own and can also be 
used in other contexts. Several new estimates on the modified Bessel 
functions of the first kind are given. Simulations confirm excellent 
performance and robustness of the proposed method. 

1. Introduction. The problem of detecting boundaries of image arise 
in a variety of areas including epidemiology [47], geology [29], ecology [15], 
forestry, marine science. A general d-dimensional {d > 2) image can be 
described as , where Xi £ T = [0,1]'^ is the location of the ith 

observation and U is the corresponding pixel intensity. Let be a given 

regular parametric family of densities with respect to a cr-finite measure v, 
indexed by a p-dimensional parameter 4> £ Q, then we assume that there is 
a closed region L C T such that 



/(•;e) ifXiGL; 
fi-,p) ifAiGL'^ 


where p are distinct but unknown parameters. We assume that both L and 
L'^ have nonzero Lebesgue measures. The goal here is to recover the boundary 
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7 = 9r from the noisy image where 7 is assumed to be a smooth (d — 1 )- 
dimensional manifold without boundary, and derive the contraction rate of 
7 at a given true value 70 in terms of the metric defined by the Lebesgue 
measure of the symmetric difference between the regions enclosed by 7 and 
70 - When the boundary itself is of interest such as in image segmentation, 
we can view the problem as a generalization of the change-point problem in 
one-dimensional data to images. 

A significant part of the literature focuses on the detection of boundary 
pixels, based on either first-order or second-order derivatives of the under¬ 
lying intensity function [35, Ch. 6 ] or Markov random fields [16, 17], result¬ 
ing in various edge detectors or filters. This approach is especially popular 
in computer vision [4, 5]. However, the detected boundary pixels are scat¬ 
tered all over the image and do not necessarily lead to a closed region, and 
hence cannot be directly used for image segmentation. A post-smoothing 
step can be applied, such as Fourier basis expansion, principal curves [21] 
or a Bayesian multiscale method proposed by [19]. However the ad-hoc two- 
step approach makes the theoretical study of convergence intractable. In 
addition, as pointed out by [3], many applications produce data at irregular 
spatial locations and do not have natural neighborhoods. 

Most existing methods are based on local smoothing techniques [ 6 , 41, 20, 
34, 37], which lead to convenient study of theoretical properties benefiting 
from well established results. However, local methods suffer when the data 
is sparse and thus the usage of the global information becomes critical. More 
importantly, it often leads to local (or pointwise) inference such as marginal 
conhdence bands losing the joint information. 

A relevant and intensively studied problem is to estimate the underlying 
intensity function E(yjA) with discontinuity at the boundary [32, 39, 12, 
20, 36, 38]. These two problems are different for at least two reasons. Firstly, 
there are many important applications where ^ and p affect /(•) not (or not 
only) in the mean but some other characteristics such as variance [ 6 ]. Sec¬ 
ondly, the reconstruction of E(Tj Ai) is essentially a curve (or surface) fitting 
problem with discontinuity and the corresponding asymptotics are mostly 
on the entire intensity function rather than the boundary itself. Therefore, 
we may refer the latter as image denoising when boundaries are present, not 
necessarily guaranteeing the geometric restrictions on the boundary such as 
closedness and smoothness. 

In this paper, we propose a nonparametric Bayesian method tailored 
to detect the boundary 70 , which is viewed as a closed smooth {d — 1 )- 
dimensional manifold without boundary. This paper has three main contri¬ 
butions. 
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The first main contribution is that the proposed method is, to our best 
knowledge, the first one in the literature that achieves all the following four 
goals (i)-(iv) when estimating the boundary. 

(i) . Guaranteed geometric restrictions on the boundary such as closedness 

and smoothness. 

(ii) . Convergence at the (nearly) minimax rate [26, 31], adaptively to the 

smoothness of the boundary. 

(iii) . Possibility and convenience of joint inference. 

(iv) . Computationally efficient algorithm. 

To address (i) and (iii), the Bayesian framework has its inherent advantages. 
For (i), we note that Bayesian methods allow us to put the restrictions on the 
boundary conveniently via a prior distribution. Specifically, we propose to 
use a Gaussian process (CP) prior indexed by the unit sphere in i.e. the 
(d—l)-sphere = {x = (xi,..., x^) G : x\ + - ■ -Tx^ = 1}, or a random 
series prior on For (iii), Bayesian methods allow for joint inference since 
we draw samples from the joint posterior distributions, as demonstrated 
by the numerical results in Section 6. The proposed method achieves the 
(nearly) minimax optimal rate adapting to the unknown smoothness level 
based on a random rescaling incorporated by a hierarchical prior [46, 42]. 
Furthermore, Goal (ii) is achieved for any regular family of noise and general 
dimensions. In contrast, for instance, the method in [31] is presented only 
for binary images and does not adapt to the unknown smoothness level. 
Although the quantification of uncertainty and adaptivity of a method is 
appealing, the computation in goal (iv) is important when implementing 
it. Many adaptive methods are hard to implement since inverses of covari¬ 
ance matrices need to be calculated repeatedly. In the proposed Bayesian 
approach, an efficient Markov chain Monte Garlo (MGMC) sampling is de¬ 
signed based on the analytical eigen decomposition of the squared exponen¬ 
tial periodic (SEP) kernel (see Section 5), for various noise distributions. In 
addition, we conduct extensive numerical studies to confirm the good per¬ 
formance of the proposed method and indicate that it is robust under model 
misspecification. 

As the second main contribution, we conduct an extensive study on the re¬ 
producing kernel Hilbert space (RKHS) of the SEP Gaussian process, which 
is essential to obtain the optimal rate and adaptation in Goal (ii). Eor the 
most important case in applications d = 2, by a simple mapping, the squared 
exponential (SE) Gaussian process on is equivalent to the SEP Gaussian 
process on [0,1] since their RKHS’s are isometric (see Lemma 4.1). Recently 
developed theory of posterior contraction rates implies that nonparametric 
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Bayesian procedures can automatically adapt to the unknown smoothness 
level using a rescaling factor via a hyperparameter in a stationary Gaussian 
processes on [0, 1] or [0,1]'^ [44, 46]. Rescaled SE Gaussian process is one 
popular example of this kind. In contrast, the literature lacks results on the 
rescaling scheme and the resulting properties of the SEP Gaussian process, 
even though it has been implemented in many applications [30] . It may due 
to the apparent similarity shared between the SEP Gaussian process and 
the SE Gaussian process. However, these two processes have fundamental 
differences because the rescaling of the argument on cannot be trans¬ 
formed as a rescaling of the mapped argument on the Euclidean domain. In 
addition, the spectral measure of the SEP Gaussian process is discrete (see 
Lemma 4.2) thus lacking the absolute continuity of that of the SE Gaussian 
process which is critical in establishing many of its properties [46]. As a re¬ 
sult, the RKHS of the SEP Gaussian process for different scales do not follow 
the usual nesting property. We overcome these issues by using the special 
eigen structure of the SEP kernel and intriguing properties of the modified 
Bessel functions of the first kind. Some of the properties of the SE Gaussian 
process still hold, however, the proofs are remarkably different. Nevertheless, 
we show that the posterior contraction rate of the boundary by using the 
SEP Gaussian process is nearly minimax-optimal, which is up to 

a logarithmic factor, adaptively to the smoothness level a of the boundary. 
Section 4 establishes a list of properties on the RKHS of the SEP Gaussian 
process, along with the contraction rate calculation and adaptation. 

The third main contribution is that we provide some new estimates on 
Bessel functions, which are critical when establishing properties on the 
RKHS of the SEP Gaussian process. Similar to the second main contri¬ 
bution, these new estimates may be of interest by their own and are useful 
in broader contexts such as function estimation on spheres in addition to 
the boundary detection problem discussed here. 

In addition to establishing key theoretical properties, we also develop an 
efficient MGMG method for sampling posterior distribution based on a SEP 
Gaussian process prior using the explicit eigen structure of the SEP Gaussian 
process obtained in this paper (taking 0(n) time in each MGMG run). The 
algorithm is generic and hence can be used for posterior computation in other 
curve estimation problems on the circle such as directional data analysis 
using the SEP Gaussian process prior. 

The paper is organized as follows. Section 2 introduces the model and 
notations. The general results on the posterior contraction rate are given 
in Section 3, along with examples of priors and posterior rate calculation 
including a finite random series prior (for d = 2 and 3) and the squared 


BAYESIAN DETECTION OF IMAGE BOUNDARIES 


5 


exponential Gaussian process prior on (for d = 2). In Section 4, we study 
the corresponding RKHS of a squared exponential Gaussian process prior on 
, or equivalently, a squared exponential periodic Gaussian process on [0,1], 
heavily relying on the properties of modified Bessel functions of the first 
kind. Section 5 proposes an efficient Markov Chain Monte Carlo methods 
for computing the posterior distribution of the boundary using a randomly 
rescaled Gaussian process prior, for various noise distributions. Section 6 
studies the performance of the proposed Bayesian estimator via simulations, 
under various settings for both binary images and Gaussian noised images. 
Section 7 contains proofs to all theorems and lemmas. Section 8 provides 
several results on the modified Bessel functions of the first kind. 

2. Model and Notations. We consider a d-dimensional image 
for d = 2,3,..., where Xi is the location of the fth observa¬ 
tion and Yi is the image intensity. We consider the locations within a d- 
dimensional fixed size hypercube, and we specifically use unit hypercube 
T = [—1/2,1/2]'^ without loss of generality. Depending on the scheme of 
collecting data, we have the following options for the distribution Px^ of Xi : 

• Completely Random Design. Xi Uniform(T). 

• Jitteredly Random Design. Let Tj be the zth block when partitioning 
T into equal-spaced grids. Then Xi is chosen randomly at Ti, i.e. Xi ~ 
Uniform(rj) independently. 

The region T is assumed to be star-shaped with a known reference point 
O G r, namely, for any point in T the line segment from O to that point is 
in T; see [ 12 ] and [26, Ch5]. If the image is start-shaped, this assumption is 
mild since in general a reference point can be easily detected by a prelimi¬ 
nary estimator of the boundary or it could be directly given in many cases 
according to some subject matter knowledge. Images of more general shape 
can possibly be addressed by and ad-hoc “divide and conquer” strategy. The 
boundary 7 = ST is a (d— l)-dimensional closed manifold. In view of a con¬ 
verse of the Jordan curve theorem we represent the closed boundary 7 as a 
function indexed by i.e. 7 : —)• M"*" : s —)• 7 ( 5 ). We further assume 

that the boundary 7 is a-smooth, i.e. 7 G C“(S'^“^), where is the 

a-Holder class on Specifically, let oq be the largest integer strictly 

smaller than a, then 

= {/ : M+, < Lf\\x - y|r"“° 

for Vx,?/ G and some Lf > 0}, 
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where || • || is the Euclidean distance. A different definition of smoothness was 
used by [31] based on the class of sets in [13], which cover cases of unsmooth 
boundary but with smooth parameterization. Here we focus on the class of 
smooth boundary therefore it may be more natural to use the definition of 
(j^a(gd-i) (directly. It may be noted that in our set-up, the boundary is not 
affected by reparameterization. 

We use 9 to denote the triplet (^,/ 0 , 7 ). Let 4>i be the parameters at the 
ith. location, i.e. = i^{Xi G E) -|- pll(Aj G L'^) where ll(-) is the indicator 
function. The model assumes that Y\X ~ Pg for some 9, where Pq has 
density \Vi=iPe,i{yi) = OiLi with respect to Let 

i 

be the average of the squares of the Hellinger distance for the distribu¬ 
tions of the individual observations. Let K{f,g) = f f log{f /g)di', V{f,g) = 
f f\ log{f /g)\‘^diy, and || • ||p denote the Lp-norm (1 < p < 00 ). We use f < g 
if there is an universal constant C such that / < Cg, and / i< 5 'if/^fi'^/. 
For a vector x G M'^, define ||x||p = and ||x||oo = maxi<j<p |xj|. 

For two sets F and F', we use F A F' for their symmetric difference and 
A(F A F') for its corresponding Lebesgue measure. We also use A( 7 , 7 ') for 
A(F A F') when 7 = 5F and 7 ' = dT'. 

3. Posterior convergence. In the following sections, we shall focus on 
the jitteredly random design; the completely random design is more straight¬ 
forward and follows the same rate calculation with minor modifications. 

3.1. General theorem. The likelihood function is given by 

L{Y\x,9) = H 

i£li i£l2 

where A = {i : Aj G F} and I 2 = {i : Aj G F'^}. The parameters (^, p) G 0*, 
where 0* is a subset of 0 x 0 = {(^,p) : ^ G 0,p G 0}. The set 0* is 
typically given as the full parameter space 0x0 with some order restriction 
between ^ and p. For instance, when /(•) is the Bernoulli distribution, then 
0* = {(.^, p) G : 0 < p < ^ < 1} if the inside probability ^ is believed to 
be larger than the outside probability. We assume that the distribution /(•) 
has the following regularity conditions: 

(Al). For fixed (/)o, we have A(/(-; 4),/(•;'^)) ^ ll</> - M? and 

V{f{-'Ao)J{-'A)) ^ as ||(/)-4||2 0 ; 
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(A2). There exist constants Cq, bo > 0 such that for (pi, (p 2 with ||i;Ai||, \\4>2\\ < 
M, we have /i2(/(-; 0i),/(•; </' 2 )) < <^ 0(1 + where 

h{f{-](j)),f{-;(p')) is the Hellinger distance between the two densities 
/(•;</>) and /(•;</>')• 

Assumptions (Al) and (A 2 ) relate the divergence and distances between 
two distributions to the Euclidean distance between the corresponding pa¬ 
rameters. Most common distributions where the parameters are bounded 
away from the boundary of their supports satisfy these two assumptions, 
particularly including all the distribution families discussed in the paper. 

The observations l^’s are conditionally independent given parameters. In 
the following sections, we let Oq denote the true value of the parameter vector 
(? 0 ) PO) 7 o) generating the data, and the corresponding region with boundary 
7 o is denoted by Tq. 

We shall denote the prior on 9 by IT. By a slight abuse of notations, we 
denote the priors on , p) and 7 also by IT. We next present the abstract 
forms of the required prior distributions in order to satisfy the minimax- 
optimal posterior contraction rate later on. The prior on (^, p) is independent 
with the prior on 7 and satisfies that 

(Bl). n(^, p) has a positive and continuous density on 0*; 

(B2). Sub-polynomial tails: there are some constants ti,t 2 > 0 such that for 
any M > 0, we have n(^ : ^ ^ [—M,M]p) < tiM~^^ and Yi{p : p ^ 
[-M,Mf) < tiM-^K 

The estimation and inference on 7 is of main interest. Therefore p) are 
considered as two nuisance parameters. When 7 is modeled nonparametri- 
cally, the contraction rate for 9 is primarily influenced by 7 . The following 
condition is critical to relate dn{9,9') to A( 7 , 7 '), which will lead to the 
contraction rate for 7 . 

(C). For given (CoiPo) £ there exists a positive constant co,n such that 
for arbitrary (^, p) € 0 *, h{f{-;^o),f{-;p))+h{f{-;po),f{--, k)) > co,n > 

0 . 

Above we allow the constant co,n, which is usually h{^o, po), to depend on 
n if we consider a sequence of true values {^o,po)- Assumption (C) can be 
interpreted as quantifying the separation of the inside and outside densities 
in terms of the Hellinger distance. The separation becoming smaller with n 
indicates the increasing level of difficulty of the problem. Assumption (C) 
holds for most commonly used distribution families {f{-;(f>);(f> G 0 } when 
0* considers the order restriction between ^ and p. Examples include but 
are not limited to: 
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• One-parameter family such as Bernoulli, Poisson, exponential distri¬ 
butions, and 0 * = 0 ^ n {(^, p) : p < ^}, or 0 * = 0 ^ 0 {(^, p) : p > 

• Two-parameter family such as Gaussian distributions, and 0* = 

02 n {((/ii,cri), (^2,0-2)) ; Pi > P2,(^i = 0-2}, or 0* = 02 n 

{{{pi,ai),{p 2 ,cr 2 )) ■ Pi > P 2 ,(yi > 0 - 2 }, or 0* = 02 n 

{((m,o-i), ip2,cr2)) : Pi = P2,cri > 02 ). 

The assertions above can be verified by noting that, by keeping one argument 
fixed, the Bellinger distance increase in the other argument in each direction 
as that moves away from the fixed value in terms of the Euclidean distance. 
In practice, the order restriction is often naturally obtained depending on the 
concrete problems. For instance, in brain oncology, a tumor often has higher 
intensity values than its surroundings in a positron emission tomography 
scan, while for astronomical applications objects of interest emit light and 
will be brighter. In this paper, we use the abstract condition (C) to provide 
a general framework for various relevant applications. 

Throughout this paper, we shall use /i(^, ^') to abbreviate 
/i(/(-; 1 ^),/(•; (/>')). The following general theorem gives a posterior 
contraction rate for parameters 0 and 7 . 

Theorem 3.1. Let a sequence Cn ^ 0 be such that ree^/logn is bounded 
awayfromO. Under Conditions (Al), (A2), (Bl), (B2), if there exists Borel 
measurable subsets C C“(S'^“^) with an = sup{|| 7 ||oo : 7 G such that 

(3.1) -logn(7:A(roAr)<e2)<ne2, 

(3.2) -logn(7GS)))>n62, 

(3.3) logiV(e2/(T^-\ II . Hoc) < nel, 

then for the entire parameter 6 = ((^, p, 7 ), we have that for every Mn —>■ 00 , 

(3.4) : dn{e,eo) > M„e„|xW,y(-)) ^ 0. 

Further, if also Condition (C) holds, then for the boundary 7 , we have that 
for every Mn —>• 00 , 

(3.5) pj;;)n (7 : A( 7 , 7 o) > TW) ^ 0 . 

Equation (3.5) claims that if the rate for 9 is e^, then the boundary 7 has 
the rate in terms of the discrepancy metric A(.,.) and can be faster 

than n “^/2 which is an interesting aspect of a boundary detection problem. 
The condition that ne'^/logn is bounded away from 0 is not restrictive 
since its sufficient condition > n~^ for some c < 1/2 is expected for 
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nonparametric problems. It ensures that the parametric components with 
priors of sub-polynomial tails are not influential for the posterior contraction 
rate compared to the nonparametric part. 

Remark 3.2. It follows immediately that ||Co ~ ■CII ^ and Upo ~ pII ^ 
en- These two parameters are not of our interest and are actually estimable 
at rate. To see this, we assume the separation Cq ,^ decays at a rate 

such that 0 (particularly co,n = cq satishes this), one may use 

a two-step semi-parametric procedure: first estimate the boundary curve 
consistently using Theorem 3.1, remove a small section of pixels neighboring 
the estimated boundary and then estimate (^, p) based on observations at 
the remaining pixels. The n“^/^-rate possibly also holds for the original 
posterior of (^, p) and will follow if a semiparametric Bernstein-von Mises 
theorem can be established using the rate in Theorem 3.1 as a preliminary 
rate; see [7]. 

In the next two subsections, we consider two general classes of priors 
suitable for applications of Theorem 3.1. 

3.2. Rate calculation using finite random series priors. The boundary 70 
is a function on which can be regarded also as a function on [ 0 , 1 ]'^“^ 

with periodicity restrictions. We construct a sequence of hnite dimensional 
approximation for functions in C“([0,1]'^“^) with periodicity restriction by 
linear combinations of the hrst J elements of a collection of fixed basis 
functions. Let $ = = (Ci; • • ■ be the vector formed by the first J 

basis functions, and 0 ^be a linear approximation to 70 with ||/3o jiloo < 
00 . We assume that the basis functions satisfy the following condition: 

(D). max ll^jlloo < for some constants t^fi^ > 0. 

i<i<^ 

Priors. We use a random series prior for 7 induced from 0^^ through the 
number of basis functions J and the corresponding coefficients (3 given J. 
For the simplicity of notations, we use (3 (/3q) for /3j {(3q j) when J is explicit 
from the context. Let 11 stand for the probability mass function (p.m.f.) of 
J, and also for the prior for [3. satisfying the following conditions: 

(El). - logn(J > j) > j logj, and - logn(J = j) < j log j. 

(E2). -logn(||/3-/3o||i < e|J) < Jlog(l/e), and n(/3 i [-M,M]^|J) < 
Jexp{—CM^} for some constant C. 

For instance, a Poisson prior on J and a multivariate normal for (3 meet the 
required conditions. 
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Finite random series priors with a random number of coefficients form a 
very tractable flexible class of priors offering alternative to Gaussian process 
priors. Their properties including asymptotic behavior of posterior distribu¬ 
tions have been thoroughly studied by [2] and [42], 

We derive conditions to obtain the posterior contraction rate as follows. 

Theorem 3.3. Let he a sequence such that e„ —)> 0 and ne^/logn is 
bounded away from 0, and Jn < n be a sequence such that J„ —)> oo. Under 
conditions (Al), (A2), (Bl), (B2), (C), (D), (El) and (E2), ifen,Jn satisfy 

IIto - Po,jM\oo < el/2, cinel < J^log < Jn log n < C 2 nel, 

for some constants ci > 0 and C 2 > 0, then the posterior contraction rate 
for 7 in terms of the distance A(-, •) is el/cl 

Remark 3.4. The optimal value of Jn, say Jn,a typically depends on 
the degree of smoothness a. We can use a fixed value J = Jn when a is 
given. The posterior distribution can be easily computed, for example, by a 
Metropolis-Hastings algorithm. If a is unknown, one will need to put a prior 
on J and reversible-jump MCMC may be needed for computation. 

Example 3.5 (Trigonometric polynomials). For the case d = 2 (2D 
image), we use trigonometric polynomials { 1 , cos 27rja;, sin 27rja;,... : co G 
[0,1]} as the basis. It is known if 70 is a-smooth, we have II 70 —/3 ^j^lloo < j " 
for some appropriate choice of Pqj [cf. 22]. Therefore according to Theo¬ 
rem 3.3, we can obtain the rate by equating Jt“ el and J„ log Jn nel, 
which gives the following rate Cn and the corresponding J„: 

Jn - ni/("+^)(logn)-^/(“+i\ el x n-“/(“+il(logn)“/("+i). 

Example 3.6 (Spherical harmonics). For 3D images {d = 3), periodic 
functions on the sphere can be expanded in the spherical harmonic basis 
functions. Spherical harmonics are eigenfunctions of the Laplacian on the 
sphere. It satishes condition (D) and more technical details and the analyti¬ 
cal expressions of spherical harmonics can be found in [43, Chapter 2], while 
MATLAB implementation is available in [14]. Let Kn be degree of the spher¬ 
ical harmonics, then the number of basis functions are Jn = K"/. The approx¬ 
imation error for spherical harmonics is Jn [11, Theorem 4.4.2]. Therefore 
we can obtain the posterior contraction rate by equating Jn and 

Jn log Jn ^ nel, which gives 

Jn X n2/("+2)(logn)-2/(“+2), el x 
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3.3. Rescaled squared exponential Gaussian process prior on S^. We use 
a rescaled squared exponential Gaussian process (GP) to induce priors on 
7 when d = 2. Specifically, let W be a GP with the squared exponential 
kernel function = exp(—||t — where t,t' G and || • || is the 

Euclidean distance. Let = (VLat,f G S^) be the scaled GP with scale 
a > 0, whose covariance kernel becomes Ka{t,t') = exp(—a^||t — The 

rescaling factor a acts as a smoothing parameter and allows us to control 
smoothness of a sample path from the prior distribution. 

When d = 2, it is natural to use the map Q : [0,1] —>■ —>■ 

(cos27ru;,sin27rw) as in [30], then by Lemma 4.1, the squared exponential 
kernel Ka{-, •) on has the equivalent RKHS as of the kernel G'a(ti,t 2 ) on 
[0,1] defined by 

Ga{ti,t 2 ) = exp(—a^{(cos27rfi — cos27rf2)^ + (sin27rfi — sin27rf2)^}) 

= exp{—4a^ sin^(7rfi — 7rf2)}. 

We call Gai',-) on the unit interval as squared exponential periodie (SEP) 
kernel. Theorem 3.7 gives the posterior contraction rate if a rescaled SEP 
Gaussian process is used as the prior. 

Theorem 3.7. Let Gonditions (Al), (A2), (Bl), (B2) and (G) hold. 

(i). Deterministie rescaling: If the smoothness level a is known, and we 
ehoose a = an = n^/*'“'’'^^(log then the posterior contraction rate 

in Theorem 3.1 is determined by = n““/(“'’'^)(logn)^"/("'’'^^. 

(a). Random resealing: If the rescaling factor a follows a gamma 
prior, then the contraction rate in Theorem 3.1 is determined by = 

j.^-a/(a+l)^^Qg ^^2a/(a+l) a > 0. 

Therefore, when the underlying smoothness level a is unknown, the SEP 
Gaussian process prior can adapt to a in a hierarchical Bayesian approach 
by assigning the rescaling parameter an appropriate prior such as a gamma 
distribution [46]. 

The proof to Theorem 3.7 relies on an extensive study of the correspond¬ 
ing RKHS of the rescaled SEP Gaussian process (see Section 4). We also 
obtain the eigen structure of the SEP Gaussian process analytically, leading 
to efficient MGMC method for posterior sampling for various distribution 
families (see Section 5). 

Remark 3.8. The rates obtained in Examples 3.5, 3.6 and Theorem 3.7 
are optimal in the minimax sense up to a logarithmic factor; see [26, Ghapter 
7]. By a uniform strengthening of Theorem 3 of [18], the conclusion can be 
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strengthened to uniform in (|o)Po,7o) £ ©o if on ©o Assumptions (Al) and 
(C) hold uniformly and ||7o||oo < Cq for a universal constant Cq > 0. 

3.4. Rescaled Gaussian process with the heat kernel. For a Gaussian pro¬ 
cess prior, various kernels may be used in addition to the squared exponen¬ 
tial kernel, for example, the heat kernel Gaussian processes studied by [8]. 
Without introducing the technical details on heat kernel theory, a Gaussian 
process with the heat kernel on can be represented as 

oo W(d) 

W^{x) = X; E 

k=0 1=1 

for any x G where are independent standard normal vari¬ 
ables indexed by {k,l), are the eigenvalues and orthonormal eigen¬ 

functions of spherical harmonics of order k whose dimension is Nk{d) = 
i2k + d- 2)/{d - and are the eigenvalues of the natural 

Laplacian on Starting from eigen decomposition of the heat kernel, 

the rescaling is applied directly to the eigenvalues A^. Note that this is a 
different rescaling strategy compared to the SEP Gaussian process where 
the rescaling is applied to the distances between two points. If we randomly 
rescale the process by letting T follow a gamma prior, then similar results as 
in Theorem 3.7 hold (possibly with a different logarithmic factor) based on 
the study of RKHS of VF^(-) available in [8] which are parallel to Lemma 4.4 
to Lemma 4.8, following the argument in proving Theorem 3.7. 

4. RKHS of SEP Gaussian processes. The RKHS of a GP plays a 
critical role in calculating the posterior contraction rate. There has been an 
extensive study of the RKHS of a GP indexed by [0,[e.g. 44, 46]. A GP 
indexed by can be naturally related to a GP indexed by [0, by a 
surjection Q : [0,—?• (for example, using the spherical coordinate 

system). Define the following kernels on [0,1]'^“^: G{si,S 2 ) = K{Qsi,Qs 2 ) 
for any si, S 2 G [0,1]'^“^. Let El be the RKHS of the GP defined by the kernel 
K, equipped with the inner product (•, •)h and the RKHS norm || • ||h. For 
the GP with covariance kernel G, we denote the RKHS, its inner product 
and norm by El', (•, •)e/ and || • Hh' respectively. Then the following lemma 
shows that the two RKHSs related by the map Q are isomorphic. 

Lemma 4.1. (El', || • Hh') and (El, || • ||h) are isometric; the conelusion also 

holds when we use the || • ||oo norm. 

However, if K is the squared exponential kernel on the kernel G(-, •) 
is no longer a squared exponential kernel on [0, More importantly, it 
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is not even stationary for general d > 2. The case h = 2 is an exception, for 
which the RKHS can be studied via an explicit treatment such as analytical 
eigen decompositions of its equivalent kernel on the unit interval. We next 
focus on the case d = 2, and study the RKHS of a GP t G [0,1]} 

with the SEP kernel Ga{-, •) which was used in Section 3.3. 

The SEP kernel Ga{-, •) is stationary since Ga{ti,t 2 ) = — ^ 2 ), where 

4’ai't) = exp{—4a^ sin^(7rt)}. The following result gives the explicit form of 
the spectral measure /Tq of the process VP“. Let 6 x be the Kronecker delta 
function and In{x) be the modified Bessel function of the first kind with 
order n and argument x where n G Z and x G M. 

Lemma 4.2. We have where is a symmetric 

and finite measure and given by fVa = Yl'^=-oo In{‘^o?) 62 Tm- 

In addition, the Karhunen-Loeve expansion of the covarianee kernel is 
Ga{t,t') = Ufc(a)'0fc(t)V’A:(iO) where the eigenvalues are given by 

vi{a) = e“^“^Io(2a^), V 2 j{a) = V 2 j+i{a) = e~‘^°‘^Ij{2a^), j > 1, 

with eigenfunctions fij{t), j = 1,2,... given by the Fourier basis functions 
{1, cos 27rt, sin 2 'Kt ,...} in that order. 

The measure pLa is the so-called spectral measure of W°‘. Existing litera¬ 
ture [e.g. 44, 46] studied convergence properties of rescaled GP on [0,1]'^“^ 
relying on the absolute continuity of the spectral measure and the scaling 
relation pia{B) = pi{aB). However, Lemma 4.2 shows that the spectral 
measure of a SEP Gaussian process is discrete and the simple relationship 
PaiB) = fii{aB) does not hold any more. We instead heavily use properties 
of modified Bessel functions to study the RKHS of a SEP Gaussian process. 

Note that the discrete measure pa has subexponential tails since 

/ oo 00 

^ e2-l’"le-2“'4(2a2) < 2e-2“' ^ e2^”4(2a2) 

n=—oo n=0 

which is bounded by ^ (Proposition 8.1 (a)). The fol¬ 

lowing Lemma 4.3 describes the RKHS ]HI“ of the rescaled process as 
real parts of a closed set containing complex valued functions. 

Lemma 4.3. The RKHS ]HI“ of the proeess is the set of real parts of 
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all functions in 

oo 

{h : [0,1] ^ C, h{t) = Y. 


bn,a G C, ^ \bn,a\‘^e In{2a^) < Oo}, 

n=—oo 

and it is equipped with the squared norm 


Y ^“"/„(2a2) = Y 


f- e-2-^/„(2a2) 


h{ty^'^^^dt 


We then consider the approximation property of EI“ to an arbitrary 
smooth function w G C“[0,1]. Unlike the approach approximating w by 
a convolution of wq with a smooth function as used in [44, 46], we use a 
finite Fourier approximation to w. 


Lemma 4.4. For any function w G C"[0,1], there exists constants Cw 
and Dm depending only on w such that inf{||/i||^a : jjtc — h\\oo < < 

D^a^ as a —^ oo. 


Lemma 4.5 obtains an entropy estimate using Proposition 8.3 on modified 
Bessel functions. 


Lemma 4.5. Let B[“ he the unit ball of the RHKS of the process hF“ = 
(TUf : 0 < t < 1), then we have log AI(e,]H[f, || • jjoo) < max(a, 1) •{log(l/e)}^ . 

As a corollary of Lemma 4.5, using the connection between the entropy 
of the unit ball of the RKHS and the small ball probability [27, 28], we have 
the following estimate of the small ball probability. 

Lemma 4.6 (Lemma 4.6 in [46]). For any oq > 0, there exits constants 
C and cq that depend only on oq such that, for a > oq and e < cq, 

— logP ( sup 1W“1 < e ] < Ca ("log . 

Vo<i<i / V e/ 

The proof of Theorem 3.7(ii) needs a nesting property of the RKHS of 
VF“ for different values of a. Lemma 4.7 in [46] proved that C VbM.\ 

if o < 6 for a squared exponential GP indexed by [0,1]*^“^. For the SEP 
Gaussian process prior, this does not hold but can be modified up to a 
global constant. 
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Lemma 4.7. If a < b, then ^/aWl c for a universal constant c. 

When a 4 , 0, sample paths of tend to concentrate to a constant value 
by the following lemma. This property is crucial in controlling the variation 
of sample paths for small a. 

Lemma 4.8. Fork G ]HI“, we have |/i(0)| < 1 and — /i(0)| < 2^/2'Kat 
for every t G [0,1]. 

5. Sampling Algorithms. We assume that the origin (center of the 
image) is inside the boundary, and thus use it as the reference point to 
represent the observed image in a polar coordinate system as {u,r]Y), 
where (u:,r) = are the locations using polar coordinates and 

Y = {Y}2=i are the image intensities. Let 7 be a closed curve, and 7 be 
values of 7 evaluated at each to. 

For most kernels, the eigenfunctions and eigenvalues are challenging to 
obtain although there are several exceptions [40, Ch. 4.3]. Therefore a ran¬ 
domly rescaled GP prior may be infeasible in practice since the numerical 
inversion of a covariance matrix is often needed when no analytical forms are 
available. However, thanks to the analytical eigen decomposition the SEP 
kernel in Lemma 4.2, we can implement this theoretically appealing prior 
in an computationally efficient way. If a curve 7 ( 0 ;) r-u GP(Mu;),G,(-,-)/r), 
we then have the equivalent representation 7 ( 0 ;) = /^(w) + 
where ~ N{0,Vk{a)/ t) independently. 

The modified Bessel function of the first kind used in Ufc(a)’s is a li¬ 
brary function in most software, such as bessell in R language. Figure 1(a) 
shows that eigenvalues decay very fast when a = 1,10. When a increases, 
the smoothness level of the kernel decreases. In practice we typically do 
not use values as large as 100 since then the kernel becomes very close 
to the identity matrix and thus the resulting prior path becomes very 
rough. The fast decay rate of Vk{a) for fixed a (Proposition 8.1 (c3)) guar¬ 
antees that some suitable finite order truncation to the Karhunen-Loeve 
expansion is able to approximate the kernel function well. Suppose we 
use L = 2J + 1 basis functions, then the truncated process is given by 
7 (‘^) = Efc=i + h{^)- Let PVEa = ^fc(a)/ Y!k=i ^k{a) be the 

percentage of variance explained by the first L basis functions, where the 
denominator Vk{a) = J2T=-oo Lfc( 2 o^) = 1 according to the defi¬ 

nition of Vk in Lemma 4.2 and the properties of the modified Bessel function 
of the first kind in Proposition 8.1. Figure 1(b) shows that with J = 10, we 
are able to explain at least 98% of all the variability for a reasonable range 
of a’s from 0 to 10 . 
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(a) (b) 

Fig 1: Decay rate of the eigenvalues of the squared exponential kernel. Figure 

(a) plots the values of U 2 j+i(o) at J = 0,..., 20 when a = 1,10,100. Figure 

(b) plots the percentage of the variance explained (i.e. PVEq) if using the 
first 21 (J = 10) basis functions at different values of a. 


Let 'I' be the n by L matrix with the kth. column comprising of the evalu¬ 
ations of V’fc(') at the components of u, and comprising of the evaluations 
of /i(-) at the components of u. Then the Gaussian process prior for the 
boundary curve can be expressed as 

7 = T' 2 ;-b/x; z ~ A^(0, Sla/r) 

where Sa = diag(ui(a),..., VL{a)). We use the following priors for the hyper¬ 
parameters involved in the covariance kernel: r ~ Gamma(500,1) and a ~ 
Gamma(2,1). For the mean //(•); we use a constant 0.1. Note that here we 
also can use empirical Bayes to estimate the prior mean by any ordinary one 
dimensional change-point method and an extra step of smoothing. However, 
our numerical investigation shows that our method is robust in terms of the 
specification of //(•). 

The priors for (^, p) depend on the error distributions. We also need to 
use the order information between the parameters to keep the two regions 
distinguishable. We use OIB, OIN, OIG for ordered independent beta, nor¬ 
mal and gamma distributions respectively. If not specified explicitly, the 
parameters are assumed to be in a decreasing order. It is easy to see that 
this convention is for simplicity of notations, and any order between the 
two region parameters are allowed in practice. Below we give the conjugate 
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priors for (^, p) for some commonly used noise distributions: 

• Binary images: the parameters are the probabilities ( 7 ri, 7 r 2 ) ~ 

• Gaussian noise: the parameters are the mean and standard deviation 
(//i, (Ti, ^ 2 , T 2 ) with the priors to be {pi, P 2 ) r-u OIN(//o,T^,^o,o-g) and 

~ OIG(a2,/32,a2,/32)- 

• Poisson noise: the parameters are the rates (Ai,A 2 ) ~ 

OIG(q;3, /^s, as, Ps); 

• Exponential noise: the parameters are the rates (Ai,A 2 ) ~ 

OIG(a 4 , P 4 , a 4 , P 4 ). 

In fact, any error distributions with conjugacy properties conditionally on 
the boundary can be directly used. We specify the hyper-parameters such 
that the corresponding prior distributions are spread out. For example, in the 
simulation, we use ai = /3i = 0 for binary images; we use po = y,ao = 10 ^ 
and a 2 = P 2 = 10“^ for Gaussian noise. 

We use the slice sampling technique [33] within the Gibbs sampler to draw 
samples from the posterior distribution for (z, p,T, a). Below is a detailed 
description of the sampling algorithms for binary images. 

1. Initialize the parameters to be 2 ; = 0,t = 500 and a = 1. The pa¬ 
rameters p) = (vri, 7 r 2 ) are initialized by the maximum likelihood 
estimates (MLE) given the boundary to be p{-). 

2 . 2 :|( 7 ri, 7 r 2 ,r,a, 1 ^) : the conditional posterior density of 2 ; (in a loga¬ 
rithmic scale and up to an additive constant) is equal to 

^log/(Fi;7ri) -b ^log/(yi;7r2) 
iGli iE /2 

7ri(I - 7r2) I 

= log —--- -b ni log - 

7r2(l-7ri) I 

where ni = Yli < 7i) Ni = Il(rj < 7^)7). We use slice 
sampling one-coordinate-at-a-time for this step. 

3. r| {z, a) ~ Gamma(a*, b*), where a* = a+L/2 and b* = b+z'^Yl~^z/2\ 

4. {■Ki,TT 2 )\{z, Y) ~ OIB(ai -b Ni,Pi + ni - Ni, ai -b N 2 ,Pi + n 2 - N 2 ), 
where N 2 is the count of I’s outside 7 and 712 is the number of obser¬ 
vations outside 7 . 

5. ajz, r: use slice sampling by noting that the conditional posterior den¬ 
sity of a (in a logarithmic scale and up to an additive constant) is 


tz'^Y-^z 


— TTi 


vr2 
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equal to 

logical _a 

2 2 e° 


E 

k=l 


log Vkia) 


-E 


TZt 


^ 2vk{a[ 


-+log a—a. 


The above algorithm is generic beyond binary images. For other noise 
distributions, the update of r and a are the same. The update of 2 and 
(^, p) in Step 2 and Step 4 will be changed using the corresponding priors and 
conjugacy properties. For example, for Gaussian noise, the parameters (^, p) 
are /U 2 , £ 12 ), and the conditional posterior density (in the logarithmic 

scale and up to an additive constant) used in Step 2 is changed to 


- ni(log(Ti - log 0 - 2 ) 


E 

ie/i 


iVi - 

2 af 


E 

ie/2 


iVi - 

2 aj 


TZ'^'S 


-1 

a 


Z 


2 


For Step 4, the conjugacy is changed to 

{pi,P2)\{z,ai,a2,Y) ~ OIN, cj^^)|(z,/ ri,// 2 , T") ~ OIG. 


Similarly, it is straightforward to apply this algorithm to images with Pois¬ 
son noise, exponential noise or other families of distributions with ordered 
conjugate prior. 


6. Simulations. 


6.1. Numerical results for binary images. We use jitteredly random de¬ 
sign for locations {aj,r) and three cases for boundary curves: 

• Case Bl. Ellipse given by r{uj) = 6162/\/(&2 cos -|- (61 sina;)^, where 
61 > 62 and u is the angular coordinate measured from the major axis. 
We set bl = 0.35 and 62 = 0.25. 

• Case B2. Ellipse with shift and rotation: centered at (0.1, 0.1) and 
rotated by 60° counterclockwise. We use this setting to investigate the 
influence of the specification of the reference point. 

• Case B3. Regular triangle centered at the origin with the height to be 
0.5. We use this setting to investigate the performance of our method 
when the true boundary is not smooth at some points. 

We keep using 7 r 2 = 0.2 and vary the values of tti to be (0.5, 0.25). For each 
combination of ( 7 ri, 7 r 2 ), the observed image is m x m where m = 100,500 
(therefore the total number of observations is n = m^). The MCMC proce¬ 
dure is iterated 5,000 times after 1,000 steps burn-in period. For the esti¬ 
mates, we calculate the Lebesgue error (area of mismatched regions) between 
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the estimates and the true boundary. For the proposed Bayesian approach, 
we use the posterior mean as the estimate and construct a variable-width 
uniform credible band. Specifically, let be the posterior sam¬ 

ples and ( 7 ( 0 ;),?(u;)) be the posterior mean and standard deviation func¬ 
tions derived from { 7 j(a;)}. For each MCMC run, we calculate the distance 
Ui = ||( 7 i — 7 )/'S||oo = sup,^{| 7 i(a;) — 7 (cu)|/S"(a;)} and obtain the 95th per¬ 
centile of all the Uj’s, denoted as Lq. Then a 95% uniform credible band is 
given by [ 7 ( 0 ;) - Los{u}),^{uj) Los{uj)]. 

We compare the proposed approach with a maximum contrast estimator 
(MCE) which first detect boundary pixels followed by a post-smoothing via 
a penalized Fourier regression. In the 1-dimensional case, the MCE selects 
the location which maximizes the differences of the parameter estimates at 
the two sides, which is similar to many pixel boundary detection algorithms 
discussed in [35]. In images, for a selected number of angles (say 1000 equal¬ 
spaced angles from 0 to 27r), we choose the neighboring bands around each 
angle and apply MCE to obtain the estimated radius and then smooth those 
estimates via a penalized Eourier regression. Note that unlike the proposed 
Bayesian approach, a joint confidence band is not conveniently obtainable 
for the method of MCE, due to its usage of a two-step procedure. 

As indicated by Table 1, the proposed Bayesian method has Lebesgue er¬ 
rors typically less than 2.5%. In addition, the proposed method outperforms 
the benchmark method MCE significantly. We also observe that the MCE 
method is highly affected by the number of basis functions; in contrast, the 
proposed method adapts to the smoothness level automatically. The com¬ 
parison between Case B1 and Case B2 shows that the specification of the 
reference point will not influence the performance of our methods since the 
differences are not significant compared to the standard error. 


Table 1 

Lebesgue errors (xl0~^) of the methods based on 100 simulations. The standard errors 
are reported below in the parentheses. 



m = 100, 

(7ri,7r2) = 

(0.50,0.20) 

m = 100, 

(7ri,7r2) = 

(0.25,0.20) 


Case B1 

Case B2 

Case B3 

Case B1 

Case B2 

Case B3 

Bayesian method 

0.64 

0.67 

2.26 

0.71 

0.8 

2.36 


(0.02) 

(0.02) 

(0.02) 

(0.03) 

(0.03) 

(0.03) 

MCE with 5 bases 

6.57 

6.58 

6.03 

6.39 

10.09 

7.03 


(0.25) 

(0.21) 

(0.07) 

(0.19) 

(0.20) 

(0.11) 

MCE with 31 bases 

8.75 

7.84 

5.96 

9.19 

11.8 

7.86 


(0.18) 

(0.19) 

(0.10) 

(0.16) 

(0.19) 

(0.14) 


Eigures 2 and 3 confirm the superior performance of the proposed method 
compared with the smoothed MCE method with 5 and 31 basis functions 
when the true boundary curve is an ellipse, an ellipse with shift and rotation 








20 


LI, M. AND GHOSAL, S. 


and a triangle. Even in the case of tti = 0.25 where the contrast at two sides 
of the boundary is small, the proposed method is still able to capture the 
boundary when m = 500. This observation is consistent with the result 
derived from the infill asymptotics when the number of data points increase 
within images of fixed size. In addition, we also obtain joint credible bands 
using the samples drawn from the joint posterior distribution. Figure 4 plots 
the trace plots of (a,'7ri,7r2) and the histogram of a to illustrate the mixing 
and convergence of the posterior samples for Case B1 when m = 500 and 
the true parameters (7ri,7r2) = (0.25,0.20). Similar plots are obtained for 
other scenarios but are not presented here. 

6.2. Numerical results for Gaussian noised images. For Gaussian noised 
images, we keep using an ellipse with shift and rotation as the true boundary 
curve (i.e. Case B2). We consider the following four scenarios where the two 
standard deviations are all given by (ai,a 2 ) = (1.5,1) and the observed 
image is 100 x 100: 

• Case Gl. fii = A, ^2 = 1, he. the two regions differ in both the first 
two moments; 

• Case G2. g-i = ^2 = 1, he. the two regions only differ in the standard 
deviation; 

• Case C3. (^i,/i 2 ) are functions of the location. Let be the small¬ 
est radius inside the boundary, and r^ the largest radius outside the 
boundary. We use /i(z) for the mean of Yi and let /i(i) = ri — r^ + 0.2 if 
it is inside, while /i(i) = ri + r^ if outside. Therefore, the mean values 
vary at each location but have a gap of 0.2 between the two regions. 

• Case G4. We use mixture normal distribution 0.6N(2, u^)-|-0.4N(1, al) 
for the inside distribution; the outside distribution is still Gaussian 
with mean ^2 = 1- 

Cases G3 and G4 allow us to investigate the performance of the proposed 
method when the distribution /(•) in the model is misspecified. For com¬ 
parison, we use a 1-dimensional change-point detection algorithm [9, 25] via 
the R package changepoint [24]. For the post-smoothing step, we use a pe¬ 
nalized Fourier regression with 5 and 31 basis functions (method CP5 and 
CP31 in Table 2). Here we use the estimates of CP5 as the mean in the 
Gaussian process prior. Table 2 shows that the proposed method has good 
performance for all the four cases. The method of CP5 and CPIO produce 
small errors in Case Gl, but suffer a lot from the other three cases. It shows 
that the change-point method highly depends on the distinction between 
the means (Case G2), and also it loses its way when the model is misspec¬ 
ified. In fact, for Cases G2, G3 and G4, the CP5 and CP31 methods lead 
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100, TTi = 0.50 



(e) Case Bl: m = 500, 
TTi = 0.25 



(i) Case B2: m — 100, 
TTl = 0.50 



(b) Bayesian Est. 



(f) Bayesian Est. 



(j) Bayesian Est. 



(k) MCE (5 basis) 



(d) MCE (31 basis) 



(h) MCE (31 basis) 



(1) MCE (31 basis) 



Fig 2: Performance on binary images (Column 1) with elliptic boundary. 
Column 2-4 plot the estimate (solid line in red) against the true boundary 
(dotted line in black). A 95% uniform credible band (in gray) is provided 
for the Bayesian estimate (Column 2). 
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D> 

(a) Case B3: m = 100, (b) Bayesian Est. 

TTl = 0.50 

o 

(e) Case B3: m = 500, (f) Bayesian Est. 

TTl = 0.25 



(c) MCE (5 basis) (d) MCE (31 basis) 



(g) MCE (5 basis) (h) MCE (31 basis) 




Fig 3: Performance on binary images (Column 1) when the boundary curve 
is an regular triangle. Column 2-4 plot the estimate (solid line in red) against 
the true boundary (dotted line in black). A 95% uniform credible band (in 
gray) is provided for the Bayesian estimate (Column 2). 
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to a curve almost containing the whole frame of the image. The proposed 
Bayesian approach which models the boundary directly, seems to be not af¬ 
fected even when the model is substantially misspecified (Case G3). Figure 5 
shows the noisy observation and our estimation from 1 replication for all the 
four cases. We can see the impressive performance of the proposed method. 
It also shows that the contrast between the two regions are visible for Cases 
G3 and G4, and the proposed method is capable to capture the boundary 
even though the distributions are misspecified. 

Table 2 

Performance of the methods for Gaussian noised images based on 100 simulations. The 
Lebesgue error (xl0~^) between the estimated boundary the true boundary is presented. 

The maximum standard errors of each column are reported in the last row. 



Case Cl 

Case G2 

Case G3 

Case G4 

Bayesian Method 

0.11 

0.99 

0.69 

0.99 

CPS 

2.90 

62.91 

62.2 

61.12 

CP31 

1.99 

64.00 

63.26 

62.10 

SE 

0.01 

0.26 

0.19 

0.27 








(a) Case G1 


(b) Case G2 


(c) Case G3 


(d) Case G4 



(e) Case G1 


(f) Case G2 


(g) Case G3 


(h) Case G4 




Fig 5: Proposed Bayesian estimates for Gaussian noised images with elliptic 
boundary. Plots (a)-(d) are the noisy observations. Figures (e)-(h) are the 
corresponding estimates (solid line in red) against the true boundary (dotted 
line in black), with a 95% uniform credible band (in gray). 
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7. Proofs. 

Proof of Theorem 3.1. Step 1: Prior concentration. Let 

f n n 'I 

BliOo, e) = < 0 ^ “ E h 

n n 

\ i=\ i=\ ) 

where = K{Pe,,uPe,i) and ¥^{ 60 , 6 ) = V{Pe,,uPe,i). When ||C - 

^oll < and ||/0 — / 0 o|| < for some small e, it follows that 

K.iOo, e) = K{io, i)P{Xi G To n P) + iL(po, p)P{Xi G pg n r=) 

+ iL(^o, p)P{Xi G Po n r^) + iL(po, OP{Xi G rg n P) 

< 11^0 - + IIpo - pf + P{Xi G PS n P) + P(W G Po n P'^) 

(7.1) = 116 - + IIpo - pf + nA[(Po A P) n P], 

according to the Assumption (A). Consequently, the average Kullback- 
Leibler divergence 

- E ^*(^0, e) < 116 - + IIpo - pf + -nA[(Po A P) n (UTi)] 

(7.2) ^ 

= lieo-ef+ l 6 o-pf+ A(PoAP). 

Similarly, the second moment P) of the log-likelihood ratio is also bounded 
in the same way, i.e. Vi{9Q,6) < ||6 - ^f + IIpo - pf + A(Po A P), which 
leads to 

B:{eo,e) D {(6/9,7) : Il 6 -^f < efs, \\po-pf < e^/S, A(Po AP) < efs}. 

Therefore, we have 11(5* ( 6 , e)) > n (||6 - ^f < e^/3, ||/ 0 o - pf < e^/3) x 
n(A(Po A P) < e^/3), or equivalently, 

- iogn( 5 ;( 6 , 66 ) < - iogn (||6 - < eV3 ,160 - pf < 6^3) 

-logn( 7 : A(Po AP) <eV3). 

By Assumption (Bl), the prior density of (6 p) is bounded below in a neigh¬ 
borhood of (6, po), indicating that n(||6 - < e^/3, ||po - pf < e^/3) > 

e^P and thus -logn(||6 - ^f < eV3, \\po - pf < eV3) < log(l/e 6 . 

Let Cn be a sequence such that Cn —)■ 0 and ne^/logn bounded away 
from 0, we then have log(l/e 6 6 logu 6 Then in order to ensure that 
-logn( 566 ,en)) < nel, it suffices that - log 11(7 : A(Po AP) < el) < nel 
in equation (3.1). 
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Step 2: Sieves. For each prior, we shall define a sieve for 7 , and consider 
©n = [—Cn,Cn]^ X [—Cn,c„]^ X as the sieve for 0. Because 

n( 0 ^) < n(e : e ^ [-c„, Cn]n + ^{p-P^ l-Cn, CnV) + n(7 : 7 ^ so, 


in order to ensure that the sieve contains most of the prior mass, it is 
sufficient to show — logn(SO 0 in equation (3.2) provided that 

-logn(^ : ^ ^ [-Cn,Cn]P) > nel and -logn(p : p ^ [-c„,c„]p) > ne^ For 
the later two conditions, we let Cn = Then — logn(^ : ^ ^ [—Cn, Cn]^) 0 
— logc“*2 by Assumption (B2), which is t 2 ■ ne^ > ne^ similarly, we have 
-logn(^ : ^ i [-Cn,Cn]P) > nel- 

Step 3: Entropy bounds. Let an = sup II 7 II 00 , for 7 , 7 ' G S„, we then have 

7 eSn 


A( 7 , 7 ') = 


/• 7 '(w) 

/ 


r-d-ldr 


du < cr, 


d-li 




Like in equation (7.2), the average squared Bellinger distance d^ has the 
following bound when ||^ — Oil < e and ||/0 — OH < e for some small e and 
max(|| 0 |,||OII,M, lip'll) <M: 


17 

dl{9,e') = -Y. h\^i,^[)dPx,<h\i,i') + h\p,p') + X{TAT') 

^ i=l 

< (1 + M'-Odie - y f + Up - pf) + - t'IIoo 


by Condition (A2). Therefore the entropy log Al(e„, 0„, d^i) is bounded by 
21 ogfV(e 2 /(l + c„)'’o,[-c„,c„f,|| . ||)+logfV(e2/^7^^S,,|| • lU) 

< log^ +logA^(e^/o-^"0s„, II • Iloo). 

Recall that Cn = Therefore log(cn/e^) < ne^+ log(l/e^) < ne^ Hence, 
in order to ensure log A'(en, 0 „, d^) < ne^, it is sufficient to verify that 
log A^(e^/fj^“0 S„, II • Iloo) < ne^ which is equation (3.3). 

Then equation (3.4) follows by applying Theorem 4 of [18]. 

Equation (3.5) will follow if we show that dn{0,0o) < f-n implies A(r A 
Fq) < ^nl^n- argued in the derivation of (7.2), dn{9,9') is given by 

^E/ h\cp,4>')dPx, = h\^o,m^oriT) + h\po,p)x{nrin 

i 

+ h\^o, p)A(Fo n r^) + h^po, e)A(re n f). 




BAYESIAN DETECTION OF IMAGE BOUNDARIES 


27 


The above expression is larger than each of the following three expressions: 

OA(ro n r) + h^{po,ox{n n r) 

> +^h{po,0) . ^ ^^pc p p^)^ 

h\po,p)Xin n r'^) + p)A(ro n r) 

> ihiP0,P)+^K^0,p)? _ ^^^pc ^ pc) ^(p^ ^ pc)^^ 

/i2(^o, p)A(ro n r) + h^{po,OKn n r) 

^ {H^0,p)+Jl{P0,O? . (^^p^ ^ pc) ^(pc ^ p))_ 

We further have /i(^o,0 + Hpo, 0 > h{^o,Po) and h{^o,p) + h{po,p) > 
^(^o,Po), by the triangle inequality, and h{^o,p) + h{po,0 > co,n > 0 by 
Condition (C). Combining with the last three displays respectively, we ob¬ 
tain 

(7.3) A(ronr)AA(rsnr)<e2/cg,„, 

(7.4) A(rgnnAA(ronr-)<e2/c2^„, 

(7.5) A(ronnAA(rsnr)<e2/cg,„, 

whenever d'^{6o,9) < e^. By adding (7.3) and (7.4) to (7.5), we derive 

(7.6) A(ro) A A(rg n r) < el/cl^, A(rs) a A(ro n r) < el/cl^. 

Since To is fixed with A(ro) > 0 and A(rg) > 0 by the assumption, (7.6) 
implies that A(rQ n T) < e^/cQ ,j, and A(ro n T'^) < c^/cq Consequently 
A(roAr) = A(rQnr) + A(ronr‘^) <e^/cQ ,j, which completes the proof. □ 

Proof of Theorem 3.3. We verify equations (3.1), (3.2) and (3.3) in 
Theorem 3.1. Since ||7o — f3o j^$\\oo < we have 

n{7 : 7 = /3^^, ||7 - 7o||oo < e^} 

> n(J = JnM\\P^^ - / 3 o ^lloo < el/2\J = Jn) 

> n(J = JnM\\P - PoWl < t^^J-'^el/2\J = Jn), 

where the last step follows because WfSj— /Sjj^Hoo < \\Pi,j ~ 
(32 j\\i ™ax ||A||oo < j ~ (32 j\\i according to the triangle inequal- 

ity and Assumption (D). Therefore, we prove equation (3.1) by noting that 
-iogn {7 = : II 7 - 70 II 00 < 4} < -iogn(j = j„)-logn( 11/3-/3o111 < 
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= Jn) < JnlogJn + Jn log( Jn/e„) < Jn log Jn + Jn log n < 

nel- 

Considering the sieve = {7 : 7 = G < Jn, ||/3||oo < 

y^njC}, the estimate of the prior mass of the complement of the sieve is 
given by 11(7 : 7 ^ Sn) < n(J > Jn) + Jne"” (see equation (2.10) in [42]). 
For any a,b> 0, we have log(a + 6 ) < log(2(aV6)), leading to — log(a + 5) > 
-log2 + (-logo) A (-log 6 ). Noting that - logn(J > Jn) > Jnlog Jn, and 
- log( Jne“"-) = n - log Jn > n - log n X n > Jn log Jn (because Jn log Jn < 
nel ^ n), we then obtain - log 11(7 : 7 ^ Sn) > Jnlog Jn > nel verifying 
equation (3.2). 

For the entropy calculation, we first notice that for any 7 G Sn, we have 
hiloo = P'^^lloo < max ||^j||oo||/3||i < J*'‘V^< which is an upper 

bound for Un- We estimate the packing number D{el/al~^, ^In, || • ||oo )by 


Diellat \{/3 e ||/3||oo < VWC}, || • ||oo) < 
i=i i=i 

which is further bounded by Jn(l+-\/n7C(Tn“^/en)‘^". Equation (3.3) follows 
since log lV(e2/cr^-i, Sn, || • ||oo) < logF>(e2/o-^-i, Sn, || • ||oo) < logJn + 
Jn log n + Jnlog(l/e2) < Jnlog n < ne^. □ 




Proof of Theorem 3.7. We first obtain the contraction rate for deter¬ 
ministic rescaling when the smoothness level a is known. 

Let B be C“(S^) equipped with the || • jjoo norm. Let <?i>“Q(e) = 

< e) stand for the concentration func- 

7eII“:||7-7o||oo<e 

tion at 70 . Note that i?io(e) = ~ logP(||lF“||oo < e). 

The selection of sieves and entropy calculation is similar to Theorem 2.1 
in [45] with Cn replaced by el and adjustment because of the involvement of 
cJn later on. Define the sieve as Sn = (MnlHf-|- ^enMT^Bi), where Hf and Bi 
are the unit balls of EI“ and B respectively. Let M„ = —2‘h“^(exp(—Cne^)) 
for a large constant C > 1. Then by Borell’s inequality, we can bound 
n(S^) < n(7 0 MrMl+elMi) < exp(-Cne 2 ), provided that (l)l{\elM-^) < 
nel- 


For the entropy calculation, hrst observe that < nel since 

4’“^(u)| < y /2 log(l/u) for u G (0,1/2). We further notice that 
C Bi since by Lemma 4.3, for any function h G Hf, we have 

2„-2a^ 


-00 \bn,a\^e-^^ Ini 2 a^) ' 

1 (the last step uses Proposition 8.1 (a) 
by letting z = 1 and x = a^). Therefore, the sieve is a subset of 


2 

00 

00 

jn- 


< {E“-oolV 
E“_ooe-^“'ln(2a2) = 


li[“ 
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{Mn + e^M^ V4)Bi and thus Un = sup{|| 7 ||oo : 7 ^ < M„ + e^M„ < 

2Mn for sufficiently large n. By construction of a |e^M“^-net for ]HI“ is 
a ^e|M“^/2-net for Therefore by Lemma 4.5, we have 

/ /If 2 \ ^ 

<a(^fog^J <«(fogn) 2 . 

To evaluate the prior concentration probability, we proceed as follows. 
Let n“(-) be a SEP Gaussian process with the rescaling factor a. By the 
approximation property of BI" in Lemma 4.4, there exists ho G EI“ such that 
\\ho - 7o||oo < and ||/io||Ha < a. Therefore, if a“" < el/2, then 

n“(7 : ||7 - 7o||oo < el) > n“(7 : ||7 - /io||oo < 4/2) > exp{-(/>[|(e2/4)}, 

leading to -logn “(7 : ||7 - 7 o||oo < el) < (/^{el/i). 

Note that </>o(e) ^ ei{log{a/e))'^ (Lemma 4.6). To satisfy the conditions in 
Theorem 3.1, we choose a = an depending on the sample size such that el x 
a““, and an(logn)^ x nel- Then the posterior contraction rate is obtained 
as el = n““/fo+^)(logn)“^“/fo'''^\ with an = n^/fo+^^(logn)“^/fo+^). 

Now consider the random rescaling when the smoothness a is un¬ 
known. The established properties of the RKHS of from Lemma 4.3 
to Lemma 4.8 are parallel to the case when a GP is indexed by [0,1]'^“^ 
with a stationary kernel, therefore, we can directly follow the argument in 
the proof of Theorem 3.1 in [46]. There is need for a slight modification 
since the nesting property given in Lemma 4.7 has a universal constant c, 
but this does not affect the asymptotic rate. The posterior contraction rate 
el is thus obtained. □ 

Proof of Lemma 4.1. For any /', G El', there is a series of a/s such 
that fgi{-) = '^aiK{s', •), and there exists s G [0,1]'^“^ such that s' = Qs. 
Let fs = £ El- Therefore, the map 0 : El —)■ EI',(/>/s = fq^ is 

surjective. If there exists another S 2 G [0,1]'^“^ such that s' = Qs 2 and 
fs 2 = •) G El, then fs^ = fs because fs = = 

Y/eiiiK{Qs 2 ,Q-) = fs 2 - Therefore, the map f is bijective. In addition, the 
definition of G{-,-) : G{si,S 2 ) = K{Qsi,Qs 2 ) also implies that the map (p 
is distance preserving when using || • ||e and || • ||h' as the norms. Therefore 
(p extends to an isometric isomorphism between El and El'. The map p also 
preserves the distance if we use the || • ||oo norm. □ 
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Proof of Lemma 4.2. The generating function of In{‘^x) is given in 
Proposition 8.1 (a): = Yl'?^=-oo -^n(2x)z"' for z G C and z ^ 0. Let 

X = a?' and 2 ; = Noting that z + 1/z — 2 = —4sin^(7rt), we then have 

= exp{-4a2sin2(7rt)} = By defining /Tq 

as the discrete measure in Lemma 4.2, we obtain that 4>ait) = f e~'^*^d/j,a(s). 

Furthermore, Ga{t,t') = (f)a{t - t') = Xl^-oo In 

view of the orthonormality of : n G Z}, we obtain that the integral 

iL(t, for any n G Z. Hence the trigono¬ 

metric polynomials {1, cos 2n7rt, sin2n7rt : n > 1} form the eigenfunction 
basis of the covariance kernel Ga{-, •)) where the corresponding eigenvalues 
are {e“^“^/n(2a^),e“^“^/„(2a^) : n > 0}. □ 

Proof of Lemma 4.3. Since the measure has subexponential tail. 
Lemma 2.1 in [44] is directly applicable. Using the discrete measure 
defined in Lemma 4.2, the proof follows. □ 


Proof of Lemma 4.4. By Jackson’s theorem [22], for any function w G 
C“[0,1] and any N, there exists a complex-valued sequence (s„) such that 
||r^ - halloo < Au,N~°‘, where h*p^{x) = is a 

constant depending only on w. Let hw be the real part of h*j^ , then clearly 
this /iTv G IHI“ and by Lemma 4.3 its square norm is 




N 

E 

n=-N 


3-2a2 


/n(2a2 




Using the monotonicity of In{2a^) for fixed a (Proposition 8.1 (b) and (c2)), 
we have e“^“^/„(2a^) > e“^“^lAr(2a^) for n = 0, ±1,..., ±A^. Therefore, 


N 


IlhArllm < 


e-^^^lN{2a?) t 


E 


n=-N 


hN{t)e 


it2nn 


dt 


< 


e-2“"/^(2a2) 


llhwll 


Let N a, then and ^ a~^ according to Propo¬ 

sition 8.2. On the other hand, Hh-Tvll^ E ||h^||| —)• \\w \\2 which is finite. 
Therefore, it follows that ll/iTvIlna E DwO- for a constant depending only 
on w. □ 


Proof of Lemma 4.5. We construct an e-net of piecewise polyno¬ 
mials over EI“ as in the proof of Lemma 2.3 in [44]. Let (d^k to be 
the 2A;th absolute moments of the spectral measure ^a-, i-e. fd^k ~ 
^n{2a^){2Ti\n\)‘^^. In [44], fd^k = but here we do not 
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have this simple scaling relationship and need to work with directly. 
Following the same construction in [44] but use we can obtain that 

' i=o ^ ^ 

where fc G N such that Ik\ < e for given e, (5 > 0. 

For any a > 0 and j > 0, applying Proposition 8.3 with x = a^, we get 

^ e“2“"4(2a^)n^'? < (27r)^^ max(a^^ 1). 

r} = — rvi ' 



we 


have 


< 


Choosing 5 = l/{167rmax(a, 1)}, 

8“-^Y^(4j)!/(2j)!/(j!). Using Stirling’s approximation with explicit bounds: 
y/^j.jn+i/ 2 g-n < 7 j! < for all positive integers n, we obtain for 

all j > 1, 


ji - (27r)3/4(2j)i+i/4e-iji+i/2e-i . 8i 


< ^^(27^)-3/42V4. 


When j 


0 , 



1. Therefore, we have a uniform bound for 


/j^j > 0- Let k ~ log(l/e), we then have log A^(2e,EI“, || • jjoo) < 
{1/5 + l)k\og{k/e) < max(a, l){log(l/e)}^, concluding the proof. □ 


Proof of Lemma 4.6. In view of Lemma 4.5, the proof follows the ar¬ 
gument in Lemma 4.6 in [46] by letting d = 2. □ 

Proof of Lemma 4.7. We need to show that if a < 6 and / G ValHIf, 
i-6-, ||/||h“ < \/0) then U/Ujib < By Lemma 4.3, it is sufficient to show 
that In{2a?) < cbe~‘^^^In{2dP‘) for any n > 0. Consider the function 

fn{x) = y/xe~^In{x). We only need to show that fn{V^)/fn{V^) < c. By 
Proposition 8.4, we have fn{V^)/fniV^) < 1 for n > 2. 

When n = 0, Proposition 8.4 indicates that fo{x) is increasing in x 
for X < 1/2. For x G [1/2, oo). Proposition 8.2 shows that fo{x) is 
bounded above and below since the function fo{x) is continuous, pos¬ 
itive and converges to l/\/^ > 0 as x —>■ oo (meaning that both 
/o(x) and l//o(x) are bounded above). In other words, there exists con¬ 
stants ci,C 2 > 0 such that fo{x) G (ci,C 2 ) for x G [1/2, oo). There¬ 
fore, if 6 < 1/8, we have /o(\/^) < /o(V^). If 6 > 1/8, we then have 
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/o(\/^)//o(\/^) < max{/o(l/2), C 2 }/ci < oo. Consequently, for a < b, we 
have /o(\/^) < niax{/o(l/2)/ci, C 2 /C 1 , l}/o(\/^). Similarly when n = 1, 
the function fi{x) is increasing in x for x < 3/2 and there exists two con¬ 
stants C 3 , C 4 > 0 such that /i(x) G ( 03 , C 4 ) for x G [3/2, 00 ). Consequently, we 
have /i(\/^) < max{/i( 3 / 2 )/c 3 , C 4 /C 3 , l}/i(\/^). We conclude the proof 
by letting c = max{/o(l/2)/ci, C 2 /C 1 , /i( 3 / 2 )/c 3 , C 4 /C 3 ,1}. □ 

Proof of Lemma 4.8. By Lemma 4.3, an element in Hf can be viewed 
as the real part of h{t) = where hn,a sat¬ 

isfies that X]^-oo l^n,ape“^“^In(2a^) < 1. Applying the Cauchy-Schwartz 
inequality twice, we have |/i(0)p < X]^-oo = 1 and \h{t) — 

/i(0)p < By Proposition 8.5, we have \h[t) — 

/i(0)p < Svr^a^t^. This concludes the proof. □ 

8. Modified Bessel function of the first kind. The modified Bessel 
function of the first kind are solutions to the modified Bessel’s equation [48]. 
Throughout the paper, we consider integer orders and positive argument, i.e. 
In{x) with n G Z and x > 0. We first introduce some basic properties of 
Inix) in Proposition 8.1, for easy reference. 

Proposition 8.1. The modified Bessel functions In{‘^x) has the follow¬ 
ing properties: 

(a) Generating functions. For x G M, 


OO 



z G C, z 0 


n=—oo 


(b) Symmetry about the order: In{‘^x) = I-n{‘2x) for x G M and n gTj. 

(c) For n > 0 and fixed x > 0, the following properties hold: 

(cl) Series representation: /n(2x) = x” x^'^7(j!(n-|-j)!). 

(c2) Ini^x) is positive and .strictly decreasing in n. 

(c3) 4(2x) < lQ{2x){2xY/n\. 

Proof. Properties (a), (b) and (cl) can be found in most literature on 
Bessel functions, e.g., see Chapter II of [48] and 8.51-8.52 of [23]. The pos¬ 
itivity of 1,1 (2x) follows its series representation. For (c2) and (c3), we let 
Xn(x) = /n-i-i(2x)//n(2x) where n > 0 and x > 0. Then by [1], we have 


2x 


< 1 


( 8 . 1 ) 


rn{x) < 


n + l/2 + (4x2 + {n + 1 / 2 ) 2 )V 2 
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leading to the monotonicity in (c2). Equation (8.1) also implies that rn{x) < 
2xj{n + 1) for all n > 0, and hence /n(2x)//o(2x) = 0^=0 < (2a:)"'/n!, 

concluding (c3). □ 


The estimate below is obtained when x —>■ oo with n being fixed or n —>■ oo 
in such a way that tends to a finite nonnegative number. 

Proposition 8.2. Let n e Z and —)> c for some constant c > 0 

as X ^ oo. Then y/xe~^In{x) —)• ( 27 r)“^/^e“Y as x ^ oo. 


Proof. The integral formula for the modified Bessel function of the first 
kind [48, page 181] implies that, for n G Z+, In{x) is 


1 

vr 



e^“®*cos(nt)dt 


- r^\^^°^^cos{nt)dt + - r e^“"*cos(nt)dt. 
Jo ^ Jn/2 


The second integral is bounded since cost < 0 for t G [7r/2,7r]. For 
the first integral, we set u = 2^/xsin{t/2), then we have In{x) = 
e®/(7rv^) f{u,x)du + 0{l), where 


_ u _ / 

fx [u) = e 2 cos I 2n arcsin 




- 1/2 

11(0 < u < V^). 


If nx~^^^ —)• c for a constant c > 0, we have fx{u) —cos(cu). Note 
that for any x > 0, we have \fx{u)\ < \/2e““ which is integrable. Accord¬ 
ing to the dominated convergence theorem, we obtain that 

poo 

e~^In{x)y/x —)■ TT~^ / cos{cu)du = (27r)“^/^e“'^ 

Jo 

where the last step uses the real part of the characteristic function of a 
standard normal. 

□ 


Proposition 8.3. For any x > 0 and j = 0,1, 2,..., we have 
^ e“^"'I„(2x)n^^ < ^^^max(x^l). 

n=—oo ^ 

Proof. Let Gj{x,z) = then by Proposition 8.1 (a), 

we have Gj{x,z) = Yl^=-oo^n{‘^x)z^^'^L When j = 0, we thus have 
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J2’?^=-oo ® = 1, leading to the statement of the proposition. 

For j > 1, we first take the 2jth partial derivatives of G 2 j{x, z) and obtain 


d^^Gj{x,z) 


OO 

^ In{2x){n + 2j)2jz"‘, 

n=—oo 


where (n + 2 j) 2 j = (n + 2j) • (n + 2j — 1) • • • • (n + 1) is the descending 
factorial. It is easy to see that for n > 0, (n + 2j)2j > for n G [—2j, —1], 
{n + 2 j) 2 j = 0; for n < -2j, {n + 2 j) 2 j = (-l)^^(-n - 2j) • • • (-n- 1) > 0. 
Therefore, we have for z > 0, 


( 8 . 2 ) 

Let Fj{x) 


d^^Gj{x,z) 

dz‘^^ 


OO OO 

n=0 n=l 


d‘^^Gj{x,z) 


z=l 


then equation (8.2) implies that 


(8.3) 


OO OO 

e-^^In{2x)7?^ = 2^e"^^/„(2x)n2^' < 2e-‘^^Fj{x). 

n=—oo n=l 


To bound Fj{x), consider Hj{x, z) = log{Gj(x, z)} = 2j \ogz + x{z + z ^). 
By direct calculations, the pth order derivative of Hj{x, z) is given by 

dPF[j{x,z)_( 2jz~^ + x — xz~^, if p = 1, 

dzP \ {—iyp\{—2jp~^z + x)z~^P~^^\ if p > 2. 

Applying the Faa di Bruno’s formula, we have 

fo/I'l d‘^^Gj{x,z) {2j)\ Hi(x,z) fTfd^Hj{x,z) 1 

K 2=1 

where the sum is over the set K = {{ki ,..., k 2 j) ■ YliLi '^ki = 2j, ki G {0} U 
Z+l. Plugging in the expression of dPHj{x, z^jdz^ and z = 1, equation (8.4) 
leads to 



F,(x) 


d‘^^Gj{x, z) 




Z=1 



E 

K 

E 


(2j)! 

kil---k2jl 


e2-(-l)^Si 







ki 



(2jf^ 
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where the last step follows because YllLi = 2j which is even. 

Noting that 2j/i > 1 for i = 2,..., 2j, it is easy to verify that | —2j/i+x\ < 
max(x, l)2j ji for a: > 0. Consequently, we have 


|r,WI<E - 


2j 


JC 


ki\---k2j\ 


]J{max(x, 1)}^ 


i=2 


(2j) 


ki 


(8.5) 




2i 


K. 


ki\---k2j\ 


{max(x, l)}^i =2 


2=1 


Furthermore, for [ki ,..., k 2 j) G /C, we have 2 ki < Yl‘iL 2 '^ki = 2j—ki < 
2j, and hence Yl‘iL 2 ki < j- Consequently, equation (8.5) leads to 


\Fj{x)\ < max{x\ 1) ^ /.(Ij ' 11 =: max(rE^ l)^ 2 i- 

To estimate 742^, let /C^ = /C n {{ki,..., k 2 j) ■ Y^fLiki = k}. 

We then have A 2 j = Efcii(2j)^ Eye, (2j)!/(^i!''' ^2^0 IlEi = 

Efc=i( 2 j)^- 62 j,fc( 0 !, l!,.. •), where i?2Tfc(0!, 1!, • • •) is the so-called Bell poly¬ 
nomials evaluated at (0!,1!,...) and is equal to the unsigned Stirling 
number of the first kind \s{2j,k)\ [Theorems A and B, page 133-134 
in 10]. Therefore, we have A 2 j = Efc=i l'5(2j) ^)|(2j)^, which is equal 
to {2j)(2j + 1) ■ ■ ■ {2j -|- 2j — 1) = (4j)!/{2(2j)!} according to the gen¬ 
erating function of |s(2j, A:)| [equation (5f), page 213 in 10]. Therefore, 
\Fj{x)\ < max(x'^, l)(4j)!/{2(2j)!}. Combining equation (8.3), this yields 
the bound given in the statement of the proposition. □ 


Proposition 8.4. The function fn{x) = y/xe ^In{x) is increasing in x 
when x G Bn, where Bn = [0, n -|- 1/2] if n = 0,1 and Bn = [0, oo) if n>2. 

Proof. For given n > 0, let Qnix) = log fn{x) = (logx)/2 — x-|-login (x). 
Then g'^{x) = l/(2x) - 1 -I I(j(x)/In(x). Let rn(x) = In+i(x)/In(x), then 
In(x)/In(x) = rn(x) + njx (equation (8) in [1]). Therefore, the increasing 
property of fnix) follows if we show that l/(2x) -1-1- rn{x) + n/x > 0, or 
equivalently rn(x) > 1 — (n -|- l/2)/x. 

Since rn(x) > 0 for n > 0 and x > 0, fn{x) is thus increasing in x G 
[0, n -|- 1/2] for any n > 0. When n > 2, we shall use the lower bound for 
rn(x) given in [1], i.e. rn(x) > x/(n-|-l/2-|-{x^-|-(n-|-3/2)^}^/^) when x > 0. 
Let t = n-|-l/2. Then it is sufficient to show that x/(t-|-{x^-|-(f-|-l)^}^/^) > 
1 — t/x for X > t > 5/2. We rewrite this inequality as x^ — (x — t)t > 
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(x— t) which is simplified as > (x— f)^(2f+l) by algebra. 

It follows from the observation that when x>t>5/2, we have x > x — t > 0 
and > 2t + 1. Therefore, fnix) is increasing in x if n > 2. □ 

Proposition 8.5. For any x >0, we have Yl’^=-oo In{‘^x)n^ = 2x. 


Proof. Let G 2 (x,z) = A direct calculation leads to the 

relations dG 2 {x, z)jdz = ^)(xz^ — x + 2z) and d‘^G 2 {x, z)ldz‘^ = 

^x{z+z )^(^x — xz~‘^){xz'^ — x + 2z) + 2xz + 2}. On the other hand, by Propo¬ 
sition 8.1 (a), we have G 2 {x,z) = Yl’^=-oo^n{‘^x)z'^~^‘^. We take derivatives 
at the right hand side term by term and obtain that 


OO 

^ e“^^/n(2x)n^ 
n=—oo 


d^G2(x,l) 

dz"^ 


-3 


dG2{x, 1 ) 

dz 


+ 4G2(x, 1) = 2x, 


by the expression of dG 2 {x, z)/dz, d‘^G 2 {x, z) jdz^^ at z = 1. 


□ 
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